Using mixed methods and partnership to develop a program evaluation toolkit for organizations that provide physical activity programs for persons with disabilities

Background The purpose of this paper is to report on the process for developing an online RE-AIM evaluation toolkit in partnership with organizations that provide physical activity programming for persons with disabilities. Methods A community-university partnership was established and guided by an integrated knowledge translation approach. The four-step development process included: (1) identify, review, and select knowledge (literature review and two rounds of Delphi consensus-building), (2) adapt knowledge to local context (rating feasibility of outcomes and integration into online platform), (3) assess barriers and facilitators (think-aloud interviews), and (4) select, tailor, implement (collaborative dissemination plan). Results Step 1: Fifteen RE-AIM papers relevant to community programming were identified during the literature review. Two rounds of Delphi refined indicators for the toolkit related to reach, effectiveness, adoption, implementation, and maintenance. Step 2: At least one measure was linked with each indicator. Ten research and community partners participated in assessing the feasibility of measures, resulting in a total of 85 measures. Step 3: Interviews resulted in several recommendations for the online platform and toolkit. Step 4: Project partners developed a dissemination plan, including an information package, webinars, and publications. Discussion This project demonstrates that community and university partners can collaborate to develop a useful, evidence-informed evaluation resource for both audiences. We identified several strategies for partnership when creating a toolkit, including using a set of expectations, engaging research users from the outset, using consensus methods, recruiting users through networks, and mentorship of trainees. The toolkit can be found at et.cdpp.ca. Next steps include disseminating (e.g., through webinars, conferences) and evaluating the toolkit to improve its use for diverse contexts (e.g., universal PA programming). Supplementary Information The online version contains supplementary material available at 10.1186/s40900-024-00618-7.

Plain English summary Organizations that provide sport and exercise programming for people with disabilities need to evaluate their programs to understand what works, secure funding, and make improvements.However, these programs can be difficult to evaluate due to lack of evidence-informed tools, low capacity, and few resources (e.g., money, time).For this project, we aimed to close the evaluation gap by creating an online, evidence-informed toolkit that helps organizations evaluate physical activity programs for individuals with disabilities.The toolkit development process was guided by a community-university partnership and used a systematic four-step approach.
Step one included reviewing the literature and building consensus among partners and potential users about indicators related to the success of community-based programs.
Step two involved linking indicators with at least one measure for assessment.
Step three involved interviews with partners who provided several recommendations for the online toolkit.
Step four included the co-creation of a collaborative plan to distribute the toolkit for academic and nonacademic audiences.Our comprehensive toolkit includes indicators for the reach, effectiveness, adoption, implementation, and maintenance of physical activity programs for individuals with disabilities.This paper provides a template for making toolkits in partnership with research users, offers strategies for community-university partnerships, and resulted in the co-creation of an evidence-informed evaluation resource to physical activity organizations.Users can find the toolkit at et.cdpp.ca.

Disability and physical activity
The United Nations Convention on the Rights of Persons with a Disability protects the rights of people living with disabilities to access full and effective participation in all aspects of life, including sports and other recreational forms of physical activity (PA) such as exercise and active play.But because of countless environmental, attitudinal and policy barriers [1], children, youth and adults with disabilities are the most physically inactive segment of society [2,3].Physical inactivity increases the risk that people with disabilities will experience physical and mental health conditions, social isolation, and stigma [4].Systematic reviews have evaluated the effects of participation in PA programs among children, youth, and adults with physical, intellectual, mental, or sensory disabilities.Many, but not all, of these reviews have reported significant improvements in physical health, mental health, and social inclusion [2].One reason for the inconsistent outcomes is that the PA participation experiences of people with disabilities are not universally positive [5].
Qualitative and quantitative research shows that people with disabilities often report negative PA experiences; for instance, being marginalized, excluded, and receiving sub-standard equipment, access, instruction, and opportunities to fully participate in PA [6][7][8].Research and theorizing on quality PA participation and disability indicate that these low-quality PA experiences deter ongoing participation and undermine the potential physical and psychosocial benefits of PA for children and adults [5,9].These findings attest to the need for evaluation of existing PA programs to identify what is working, and where improvements are needed to achieve optimal participation and impact.

Evaluating community-based programs
Persons with disabilities increasingly participate in disability sport to be physically active, and disability sport is often delivered by community organizations [2].Like many community-based and non-profit organizations, organizations that provide PA programming for persons with disabilities (herein referred to as 'this sector') are often expected to conduct evaluations.These evaluations are done to secure and maintain external funding, demonstrate impact to board members and collaborators, and understand capacity for growth [10].Even though program evaluations are often required, real-world programs are difficult to evaluate [11] and organizations often lack capacity and resources to conduct evaluations effectively [12].Programs may be difficult to evaluate due to program complexity (e.g., setting, target population, intended outcomes; [11], and evaluation priorities (e.g., differing partner needs and resources; [13].Organizations may lack capacity in understanding and using appropriate evaluation methods and tools [14], determining what counts as evidence and its application [15], and the roles of researchers and practitioners in supporting real-world program evaluations [16]. Evaluation frameworks can be used to facilitate a guided, systematic approach to evaluation.A framework involves an overview or structure with descriptive categories, meaning they focus on describing phenomena and how they fit into a set of categories rather than providing explanations of how something is working or not working [17].One evaluation framework that is commonly applied in PA and disability settings is the RE-AIM framework [18].RE-AIM is comprised of five evaluation dimensions or categories: (a) Reach: the number, proportion, and representativeness of individuals who engage in a program, (b) Effectiveness: the positive and negative outcomes derived from a program, (c) Adoption, the number, proportion, and representativeness of possible settings and staff participating in the program, (d) Implementation: the cost and extent to which the program was intended to be delivered, and (e) Maintenance: the assessment beyond six months at the individual and organizational levels.The RE-AIM framework is appropriate in this sector because it aligns with organizations' need to understand factors that influence PA participation at both individual and organizational levels and for process (formative) and outcome (summative) evaluations [19][20][21][22][23]. Additionally, the RE-AIM framework has demonstrated feasibility to evaluate programs in this sector [19,21,22].The RE-AIM framework was developed to address the failures and delays of getting scientific research evidence into practice and policy [18].

Gaps between evaluation research and practice
There has been a growing body of evidence to suggest that one of the most effective ways to bridge the gap between research and practice is through integrated knowledge translation (IKT; [24]).IKT means that the right research users are meaningfully engaged at the right time throughout the research process [25].IKT involves a paradigmatic shift from recognizing researchers as 'experts' to valuing the expertise of individuals with lived experience, programmers, and policymakers through their inclusion in the development of the research questions, methods, execution, and dissemination to ensure that the research is relevant, useful, and usable [25].A commitment to IKT aligns with the "nothing about us without us" philosophy of the disability rights movement [26] and is therefore ideal for a toolkit development process for this sector.
To address the gaps of lack of evidence-informed resources and reduced organizational capacity to conduct program evaluations [12], our community partners (leaders from seven Canadian organizations in this sector) identified that a toolkit is needed.An evaluation toolkit is a collection of tools that includes materials that may be used individually or collectively, such as educational material, timelines, and assessment tools, and the tools may often be customized based on context, thus helping to bridge the translation gap between evidence and practice [27].Toolkit development can be a multistep process including literature reviews, interviewing partners, and using a Delphi approach [27].Previous research with community-based disability PA organizations suggests that digital platforms can be an efficient way for participants and staff to provide evaluation access to evaluation tools [19,23].Together, this research culminated in our decision to (1) use RE-AIM for the toolkit's framework, meaning the toolkit was organized using the five evaluation dimensions, and (2) to deliver the toolkit through interactive technology.The purpose of this paper is to report on a systematic, IKT-focused process for the design, development, and formulation of implementation considerations for an online RE-AIM evaluation toolkit for organizations that provide PA programming for persons with disabilities.

Research approach
A community-university partnership was established between seven Canadian disability PA organizations and three universities.A technology partner guided the back-end development of the online toolkit.Using an IKT approach [25], community partners were engaged before the research grant was written and submitted to ensure that the project was meaningful and focused on the appropriate tasks and outcomes.To guide our partnership, we agreed to adopt the IKT guiding principles for SCI research [25] which aim to provide a foundation for meaningful engagement between partners.An example of a guiding principle is partners share in decision-making [25].The principles were presented at each bi-monthly team meeting and participants had the opportunity to share concerns if certain principles were not upheld.Partners had regular opportunities for sharing in decision making, provided financial contributions to accelerate the project, and benefitted from developing the toolkit to tailor indicators and measures relevant for disability PA organizations.Two community partner leaders also provided mentorship to academic trainees on community engagement in research, employment in non-academia, and project management, emphasizing the multi-directional nature of the partnership.To see the entire IKT process, see Appendix A in the supplemental file.
To maximize the likelihood that our toolkit is used in practice, our development process was guided by the Knowledge-to-Action (KTA) framework (see Fig. 1; [28]).The KTA framework was developed to help researchers with knowledge translation by identifying the steps in moving knowledge into action [28].The KTA framework has two components: (a) knowledge creation and (b) action cycle.Our toolkit development process followed the steps of the action cycle, whereby existing knowledge is synthesized, applied, and mobilized.The problem to be addressed is a need for a program evaluation toolkit.To solve the problem, as shown with the yellow boxes in Fig. 1, the steps for developing the RE-AIM evaluation toolkit included: (1) identify, review, and select knowledge; (2) adapt the knowledge to the local context and users; (3) assess the barriers and facilitators to knowledge use; and (4) select, tailor, and implement the toolkit.
To guide toolkit development, we ensured the methods aligned with recommendations from the COnsensus-based Standards for the selection of health Measurement Instruments/ Core Outcome Measures in Effectiveness Trials (COSMIN/COMET) groups for generating a set of core outcomes to be included in health intervention studies [29].These guidelines state that developing a core outcome set requires finding existing outcome measurement instruments (see Step 1), quality assessment of instruments (see Step 2), and a consensus procedure to agree on the core outcome set (see Step 2) [29].
Step 1: Identify, review, and select knowledge

Literature review
The first step in identifying, reviewing, and selecting knowledge was to conduct a literature review.The literature review examined research using the RE-AIM framework to evaluate community-based and healthrelated programs.This was completed through a search of www.re-aim.org (which lists all RE-AIM evaluations) to identify indicators for each RE-AIM dimension within community-based and health-related contexts.Studies were included if they: used the RE-AIM framework to evaluate a community-based health program or involved persons with disabilities, were published in English, and were peer reviewed.All study designs were included.The review also examined qualitative and quantitative studies of outcomes of community-based PA programs for people with disabilities (e.g., [9]) and outcomes our own partners have used in their own program evaluations.These papers and outcomes were discussed and chosen during early partnership meetings to initiate a list of indicators.Examples of community-based programs included peer support programs for individuals with spinal cord injuries in Quebec.Data extracted from papers included indicators (and their definitions) and associated measures used for evaluations.

Delphi process
The second part in identifying, reviewing, and selecting knowledge involves critically appraising the relevant literature identified, to determine its usefulness and validity for addressing the problem [28].To determine usefulness and validity, a consensus-building outreach activity was used-an online Delphi method.Briefly, the Delphi method is used to arrive at a group decision by surveying a panel of experts [30,31].The experts each respond to several rounds of surveys.Survey responses are synthesized and shared with the group after each round.The experts can adjust their responses in the next round based on their interpretations of the "group response." The final response is considered a true consensus of the group's opinion [30,31].Delphi was ideal for our partnership approach because it eliminates power dynamics from the consensus-building process and ensures every expert's opinion is heard and equally valued.Previous research has demonstrated the utility of Delphi methods to generate consensus among disability organizations regarding the most important outcomes to measure in a peer-support program evaluation tool [32].
Delphi methodologies are considered a reliable means for achieving consensus when a minimum of six experts are included [33].Therefore, we aimed to recruit a minimum of six participants from each target group (i.e., members of disability PA organizations and researchers).Partners were encouraged to invite members who may qualify and be interested in completing the Delphi process.Participants completed a two-round Delphi process and were asked to rate each RE-AIM indicator on a scale of 1 (not at all important) to 10 (one of the most important).An indicator was included if at least 70% of participants agreed it was "very important" (8 or above) [31].Indicators that did not meet these criteria were removed from the list.
Retained indicators were then paired with at least one possible measure of that indicator (e.g., the 'Positive Youth Development' indicator was paired with the Out-of-School Time Observation instrument [34]).The partnership's goal was to develop a toolkit comprised of valid and reliable measures.Therefore, the validity and reliability of each measure were critically appraised by the academic team-members using COSMIN/COMET criteria [29].For some 'Effectiveness' indicators, published questionnaires were identified from the scientific literature.Measures were retained if they had high quality evidence of good content validity and internal consistency reliability [29] and were used in PA contexts and/or contexts involving participants with disabilities.The measures of all other indicators (where no published questionnaire measure was identified) were assessed by nine partners and modified to ensure that the measure was accurate and reliable for evaluation use in this sector.

Step 2: Adapt knowledge to local context
In the KTA framework, this phase involves groups making decisions about the value, usefulness, and appropriateness of knowledge for their settings and circumstances and customizing the knowledge to their particular situation [28].Using Microsoft Excel, partners were sent a list of the selected indicators and measures in two phases (Phase 1: "RE" indicators and Phase 2: "AIM" indicators).Partners were asked to rate, on a scale of 0 to 2 the following categories for each measure: feasibility-time (not at all feasible to feasible), feasibility-complexity (not at all feasible to feasible), accuracy (not at all accurate to accurate), and unintended consequences (no, maybe, yes).They were also asked to provide additional feedback.This step only involved partners on the project with experience administering questionnaires (in research or evaluation settings) because the process required knowledge of how to administer measures to respondents.The median and mean of each category were calculated with community partner responses given double weighting/value relative to academic partner responses.Double weighting was given to community partner responses as the toolkit is anticipated to be used more frequently in community settings.The feedback was summarized.Results were presented to all partners during an online meeting, and team members discussed feedback to establish agreement on measures.The measures were sent out to partners again to provide any final feedback on included indicators and measures.The selected indicators and measures were compiled in an online program evaluation toolkit compliant with accessibility standards.

Step 3: Assess barriers and facilitators
In the KTA framework, this step involves identifying potential barriers that may limit knowledge uptake and supports or facilitators that can be leveraged to enhance uptake [28].In Step 3, partners were invited to participate in an unstructured, think-aloud interview while they used the online program evaluation toolkit [35].Interviews were conducted to collect detailed data about how users reacted to different parts of the toolkit content, format, and structure.Each interview was conducted over Zoom with one participant and two interviewers.The two-to-one interview format [36] supported the ability to take notes during the interview, ask questions from different perspectives, and reflect on common experiences to the two interviewers [36] with the website.Participants were also asked how the toolkit was used and any barriers to its use, and identified features of the toolkit that may need to be changed.In a separate group meeting, team members were asked for ideas on how to overcome potential barriers to using the toolkit and tips for its implementation.Data were analyzed using a content analysis approach [37] and recommendations were prioritized by the lead and senior authors using the MoSCoW method [38].The MoSCoW method is a prioritization technique that has authors categorize recommendations using the following criteria: (a) "Must Have" (Mo), (b) "Should Have" (S), (c) "Could Have" (Co), and (d) "Won't Have This Time" (W).These recommendations were presented to all partners for further discussion.Based on the feedback, the toolkit content and technology were further iterated as needed.Information from this step was used to write brief user guides for toolkit users.
Step 4: Select, tailor, implement In the KTA framework, this step involves planning and executing interventions to promote awareness and implementation of knowledge, and tailoring interventions to barriers and audiences [28].In Step 4, during an online partnership meeting, a brainstorming activity was completed to discuss target audiences for the toolkit, barriers and facilitators to outreach, and dissemination ideas.Team members formulated a dissemination plan and identified promotional resources they need to tailor the dissemination of the toolkit to their sector networks.

Results
Step 1: Identify, review, and select knowledge

Literature Review
The initial searching process on the re-aim.orgdatabase identified 15 papers with relevant indicators for a RE-AIM toolkit.These papers and their citations are in Appendix B in the supplemental file.Additional resources identified by partners included: [2,9,39,40], and partners' previous experiences with evaluations to inform potential indicator choices.In total, 62 indicators were identified across all RE-AIM domains.

Delphi process
In round 1, 32 people participated in the exercise (two participants did not provide demographic information).In round 2, 28 people completed the questionnaire (four participants did not provide demographic information).Detailed participant demographics are presented in Table 1.The adaptation of indicators through the Delphi process can be found in Fig. 2. Given that nearly all indicators were deemed important from round 2, we agreed that a third round of the Delphi process was not needed.Based on the literature review, measures for each indicator were identified.

Step 2: Adapt knowledge to local context
Eight partners (n = 3 academic, n = 5 community) completed the rating process for the "RE" domains and 10 partners (n = 3 academic, n = 7 community) completed the rating process for the "AIM" domains (rating feasibility, complexity, accuracy, and unintended consequences; see Table 2).Respondent feedback was used to adapt and improve the measures to make them more feasible, less complex, and more accurate to reflect the indicators properly.Respondents also suggested that each measure should also include information boxes about the respondents, administrators, type of data collection, and time to complete data collection.The adaptation of indicators and measures from this process can be found in Fig. 2. The final list of indicators and measures can be found in Table 3.

Step 3: Assess barriers and facilitators
Six partners (community and academic partners) participated in unstructured think-aloud interviews, one of which was conducted jointly with two partners (M time = 43.37,SD ± 13.50 min).Across interviews, 45 unique recommendations were identified for improving the usability of the toolkit.These recommendations were sorted using the MoSCoW method, and prioritized based on budgetary constraints, team skillsets, and competing needs.Of the 45 recommendations, 30 were identified as 'Must haves' , 6 as 'Should haves' , 4 as 'Could haves' , and 5 as 'Won't haves' (see Appendix C in the supplemental file).All 30 'Must have' recommendations were implemented in collaboration with the technology partner, along with 2 'Should have' recommendations.
Step 4: Select, tailor, implement After all recommendations were executed by the technology partner, a final project meeting was held to discuss project updates, barriers and facilitators to outreach, and

Independence
One's ability to be self-sufficient "I am more independent" (1-item) Survey (Fixed)

Life satisfaction
Feelings of satisfaction with one's own life, including self-satisfaction and satisfaction with different aspects of life NIH Toolbox -General Life Satisfaction (5-items) (46) Survey (Fixed) Meaning and purpose Meaning in life refers to the feeling that one's life and experience make sense and matter.Purpose refers to the extent to which one experiences life as being directed, organized, and motivated by important goals NIH Toolbox -Meaning and Purpose (7-items) [46] Survey (Fixed)

Confidence
Feeling self-assured about one's own qualities/ capabilities 1-item measure: "I am ______ that I can accomplish most things I set out to do." On a scale from: Much less confident, to Much more confident [47] Survey (Fixed) Self-efficacy A person's feelings of control over their life, as well as their confidence in being able to manage their own functioning and deal effectively with situations and demands Domain or task specific self-efficacy: 1-item measure: "How confident are you in your ability to…." followed by specific skill: 10-point Likert scale anchored by not at all confident and extremely confident.[48] General Self-Efficacy (4-items) [49] Survey (Fixed)

Resilience
The ability to bounce back and recover from difficult/ stressful situations Brief Resilience Scale (6-items) [50] Survey (Fixed) Quality of Life An individual's perceptions of their position in life and in relation to their goals, expectations, standards, and concerns.This concept incorporates a person's physical health, psychological state, level of independence, social relationships, personal beliefs, and their relationships to salient features of the environment WHO QoL BREF (26-items) [51] Survey (Fixed)

Social support
Perceived and received support that is intended to enhance the well-being and positive outcomes of the recipient Social Support and Exercise Survey (13-items) [52] Survey (Fixed)

Sport and community inclusion
Making sure everybody has the same opportunities to participate in every aspect of sport and community to the best of their abilities and desires.Inclusion requires making sure that adequate policies and practices are in effect in a community or organization Social Inclusion Measure (12-items) [53] Survey (Fixed)

Physical activity
Any bodily movement produced by skeletal muscles that requires energy expenditure.Physical activity refers to all movement including during leisure time, for transport to get to and from places, or as part of a person's work International Physical Activity Survey: 7-items, estimates days/ week, hours/day, and minutes/day of intensity of physical activity and sitting time [54] Survey (Fixed)

Health
The overall evaluation of one's physical and mental health Global Health Scale (7 to 10-items) [55,56] Survey (Fixed) Physical self-concept A combination of individual self-perceptions related to physical appearance, athletic abilities, and physical capacities Physical Self-Description Survey Short Form (47-items) [57] Survey (Fixed)   ideas for dissemination.Barriers to outreach included lack of research or evaluation knowledge to use the toolkit, lack of funding to conduct evaluations, poor turnover from reaching users (i.e., users becoming aware of the toolkit) to receiving (i.e., users browse the toolkit website) to using the toolkit (i.e., users use the toolkit for an evaluation), and challenges connecting with hard-toreach organizations.Facilitators to outreach included providing resources for evaluation support, connecting with trainees to support evaluations, having positive self-efficacy and attitude for conducting evaluation, building awareness on the benefits of the toolkit through a dissemination campaign, credibility in the toolkit development process, and reaching out to key funders for administration of toolkit as guidance.
The toolkit can be found at et.cdpp.caand is intended to be used by community organizations and academic institutions that conduct program evaluations involving PA and disability (and inclusive integrated programming).This interactive toolkit allows users to customize to their program evaluation situation by selecting a) which RE-AIM dimensions they want to evaluate, and b) which indicators they want to measure within a particular RE-AIM dimension (e.g., self-efficacy and quality participation within the Effectiveness dimension).Based on users' selections, the toolkit program compiles the corresponding measures for each indicator into a customized, downloadable document that the user can then put in the format of their choosing (e.g., online survey, paper questionnaire) for their program evaluation.This design aligns with partner requests for a simple online interface that provides flexibility and tailoring to their program evaluation needs.The toolkit and user guides are made freely available (i.e., open access), to maximize accessibility to community organization and academic audiences.
A plan with dissemination and capacity building activities was created to ensure the supported uptake of the evaluation toolkit.Our priority was to create a knowledge translation and communications package (e.g., newsletter article, social media content) for community partner organizations to disseminate through their channels.This included disseminating information to other community organizations within their network and funding partners (e.g., Sport Canada, Canadian Tire Jumpstart, ParticipACTION, provincial ministries, and the Canadian Paralympic Committee).This package served as the official 'launch' of the evaluation toolkit on July 20, 2023.Through this package, other activities were listed as potential 'services' interested parties can use.These services include bookable time for 'office hours' whereby a one-on-one meeting on how to use the toolkit and conduct program evaluation can be arranged and a 1-h 'frequently asked questions' webinar/workshop.Other activities included publishing an open-access manuscript, writing knowledge translation and media blogs about the manuscript, and delivering academic and community conference presentations.

Discussion
The purpose of this paper was to report on the process of developing an evaluation toolkit in partnership with organizations that provide PA programming for persons with disabilities.Informed by the RE-AIM framework [18] and the knowledge-to-action framework [28], the toolkit development process involved a literature review, Delphi process, and interviews to adapt indicators and measures.Recommendations from partners were implemented, and the final toolkit can be found at et.cdpp.ca.Partners collaborated to create a dissemination and capacity building plan to support the uptake of the toolkit across the target audience.
Community organizations struggle to conduct program evaluations and to use existing evaluation frameworks.A recent scoping review identified 71 frameworks used to evaluate PA and dietary change programs [41].Despite access to many frameworks, Fynn et al. [41] found limited guidance and resources for using the frameworks.In response to these concerns, the toolkit acts as a resource for using the RE-AIM framework by facilitating the uptake of evidence-informed evaluation practices.The toolkit will help organizations overcome barriers to evaluation identified by previous research by increasing capacity to use appropriate methods and tools [14] and providing education on determining what counts as evidence and data [15].This can facilitate better organizational direction, improved programming, and importantly, better quality PA experiences for individuals with disabilities.The toolkit also complied with accessibility standards, an important benchmark for our partnership and a necessary step when creating a product for organizations that serve persons with disabilities.Accessibility standards were relatively easy to achieve and should be customary in all IKT activities.
To the best of our ability, the toolkit was developed specifically for organizations that provide programming for people with disabilities by focussing the literature review, having program partners in the disability community participate in the Delphi process, and ensuring the validity and reliability of indicators in disability contexts.However, there is an enormous shortage of data related to PA and disability as most national health surveillance systems exclude or do not measure disability [2].While this general limitation may affect the toolkit, it also means that the toolkit may be useful for universal PA organizations that are interested in evaluating programs with non-disabled individuals.Additional research is needed to examine the effectiveness of the toolkit in diverse contexts.
This project provides a template for developing openaccess, online evidence-informed toolkits using an IKT approach with community partners.There are few resources on how to develop toolkits for the health and well-being field informed by knowledge translation frameworks or that include perspectives of end-users (e.g., [42,43]).The four-step mixed-methods approach was guided by the systematic use of frameworks to inform toolkit development.Our project utilized a rigorous, step-by-step process for creating toolkits and resources for this sector that centres the knowledge and expertise of research users.To centre the knowledge and expertise of research users, we employed several strategies identified by Hoekstra et al. [44] for building strong disability research partnerships.Important strategies for partnership when developing a toolkit include (1) using a set of norms, rules, and expectations, (2) engagement of research users in the planning of research, (3) using consensus methods (i.e., Delphi), and (4) recruiting research users via professional or community networks [44].
First, we used the IKT Guiding Principles [25] as the set of norms, rules, and expectations to guide our partnership.These principles were addressed throughout the partnership and provided criteria to understand the success of the partnership.Second, we engaged with community partners from the beginning of the research process.Working with community partners who were committed to developing a highquality product was integral to the success of this project.Community partners were committed and highly engaged as the toolkit stemmed from a community-identified need, rather than solely a 'research gap' .Third, using consensus methods is an excellent strategy to avoid decision-making that is dominated by certain voices or interests in the partnership [45].One way that our project allowed for multiple voices to be heard was through our anonymous Delphi processes, which encouraged partners to share their input in a non-confrontational and data-driven manner.Fourth, in our partnership, many individuals and organizations had longstanding working relationships and aligned priorities for the project.Building our partnership based on previous trusting, respectful relationships was essential and using the IKT guiding principles [25] ensured that we maintained similar values and priorities throughout the partnership.
We used an additional strategy that has not been previously mentioned in the IKT literature: mentorship of research trainees by community partners.Through monthly meetings, two community partners provided mentorship sessions to three trainees.These sessions focused on how to close the research-to-practice gap and helped to facilitate strong relationships between researchers and research users.Mentorship was an important step for training the next generation of researchers to use IKT.

Limitations
This project has some limitations.First, an exhaustive systematic scoping review was not conducted to identify evaluation indicators.This may have limited the number of relevant evaluation indicators included in the Delphi surveys.However, given that only five indicators were removed, and none were added after two rounds of Delphi, we are confident that our search returned relevant indicators.In the future, it may be worthwhile to consider an in-person or video-conferencefacilitated Delphi process to encourage discussion and differentiation of indicators.Second, we identified several barriers and facilitators for using the toolkit, but addressing these barriers meaningfully was beyond the scope of this paper.We are currently in the process of disseminating (e.g., social media campaigns, blogs, discussions with funders) and evaluating the toolkit (e.g., surveys, using data analytics).This data will be reported in a future paper.Third, the interviews revealed 45 unique recommendations for the website and toolkit, but only some of these recommendations could be implemented due to budgetary constraints (e.g., adding a search function and filtering indicators to the website could not be completed).

Conclusions
In summary, this paper reports on the development of an online, open-access program evaluation toolkit for the disability and PA sector.The toolkit is informed by the RE-AIM framework [18] and available at et.cdpp.ca.Our paper describes a four-step process guided by the KTA framework [28] and IKT principles [25] to work with community partners to ensure the toolkit is relevant, useful, and usable.The process included reviewing the literature, building consensus through two rounds of Delphi surveys, rating the feasibility and complexity of measures, assessing barriers and facilitators through think-aloud interviews, and crafting a dissemination and capacity-building plan.This paper provides a template for creating toolkits in partnership with research users, demonstrates strategies to enable successful community-university partnerships, and offers an evidence-informed evaluation resource to organizations that provide PA programming for persons with disabilities.

Fig. 2
Fig. 2 Adaptation process for indicators and measures from the Delphi process and partner feedback during COSMIN/COMET rating

Table 1
Demographic details for Round 1 and Round 2 Delphi participants *Participants can select more than one answer

Table 2
Percentage of indicators with median ratings indicating feasibility (time, complexity), perceived accuracy, and no unintended consequences for RE-AIM

Table 3
Indicators and their definitions and measures included in online toolkit (et.cdpp.ca)